Optical Character Recognition of Heavily Distorted Text Segments

Author

  • Alexander Motzek
Abstract

Artificial text segments can be generated that humans can read but artificial intelligence cannot, which shows that research on optical character recognition is not as advanced as it could be. Such text segments serve as human interaction proofs in computer security, better known as CAPTCHAs. Taking one of the most widely used CAPTCHAs, the reCaptcha system, as a guideline for heavily distorted text segments, this thesis aims to develop a robust recognition algorithm that is easily adaptable to other systems and use cases. Within the scope of this thesis, a strong character classifier based on Fourier descriptors is developed. The test results indicate that the proposed algorithm outperforms current state-of-the-art character recognition systems in single character recognition, with a mean recognition rate of % vs. %. Furthermore, with a recognition rate of %, it is able to break the reCaptcha system and beats human recognition performance by nearly %. In addition, it is shown to be adaptable to other CAPTCHA systems and, despite being developed for character recognition, also to numbers. The complete algorithm depends solely on pre-rendered instances of characters, without any knowledge or instances of the applied distortions, and is able to recognize and separate merged characters.
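
The abstract does not say how the Fourier descriptor classifier is constructed. As a rough illustration only, and not the method of the thesis, the following Python sketch shows one common textbook formulation: the boundary of a shape is read as a sequence of complex numbers, its discrete Fourier transform is taken, and the coefficient magnitudes are normalized so that they do not change under translation, scaling, rotation, or a different starting point. The names `dft`, `fourier_descriptors`, and `nearest_template`, the toy contours, and the nearest-neighbor matching are all assumptions made for this example.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform of a list of complex numbers."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def fourier_descriptors(contour, n_coeffs=3):
    """Invariant Fourier descriptors of a closed contour (illustrative sketch).

    contour: list of (x, y) boundary points in traversal order,
             assumed here to be counterclockwise.
    Returns 2 * n_coeffs normalized coefficient magnitudes.
    """
    if len(contour) <= 2 * n_coeffs:
        raise ValueError("contour needs more than 2 * n_coeffs points")
    coeffs = dft([complex(x, y) for x, y in contour])
    scale = abs(coeffs[1])  # dividing by this removes the overall scale
    if scale == 0:
        raise ValueError("degenerate contour: first harmonic is zero")
    n = len(coeffs)
    # coeffs[0] is skipped (it only encodes position); taking magnitudes
    # discards phase, which is where rotation and start point show up.
    positive = [abs(coeffs[k]) / scale for k in range(1, n_coeffs + 1)]
    negative = [abs(coeffs[n - k]) / scale for k in range(1, n_coeffs + 1)]
    return positive + negative


def nearest_template(descriptor, templates):
    """Return the label whose stored descriptor is closest (Euclidean)."""
    return min(templates, key=lambda label: math.dist(descriptor, templates[label]))


# Toy contours standing in for pre-rendered character outlines (hypothetical data).
SQUARE = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
RECTANGLE = [(0, 0), (2, 0), (4, 0), (6, 0), (6, 1), (4, 1), (2, 1), (0, 1)]
TEMPLATES = {
    "square": fourier_descriptors(SQUARE),
    "rectangle": fourier_descriptors(RECTANGLE),
}
```

Because only normalized magnitudes are compared, a rotated, scaled, and shifted copy of `SQUARE` yields the same descriptor as the original and is matched to the `"square"` template. A real character classifier would need denser contours, more coefficients, and some handling of the non-rigid distortions that CAPTCHAs apply.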

Similar resources

OCR for printed Kannada text to Machine editable format using Database approach

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system recognizes printed Kannada text and can handle all types of Kannada characters. The system first extracts the image of the Kannada script, then performs line segmentation on the image, then segments the words into sub-character level piece...

Localization and Recognition of Text with Perspective Distortion in Natural Scenes

Recognizing text in natural scene images refers to the problem of identifying the words present in them. Scene text recognition is very difficult for several reasons: images contain very little linguistic context, recognition requires interpreting varied versions of letters and digits, and scene text can appear in any orientation. Most of the existing works are ...

A Model of On-line Handwritten Japanese Text Recognition Free from Line Direction and Writing Format Constraints

This paper presents a model and its effect for on-line handwritten Japanese text recognition free from line-direction constraint and writing format constraint such as character writing boxes or ruled lines. The model evaluates the likelihood composed of character segmentation, character recognition, character pattern structure and context. The likelihood of character pattern structure considers...

Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition

In this paper, we present a Character-Aware Neural Network (Char-Net) for recognizing distorted scene text. Our Char-Net is composed of a word-level encoder, a character-level encoder, and an LSTM-based decoder. Unlike previous work, which employed a global spatial transformer network to rectify the entire distorted text image, we take the approach of detecting and rectifying individual characters....

Detection and Extraction of Text Connected to Graphics in Maps

The separation of text from graphics has been challenging researchers for many years. The difficulty arises when there is text connected to graphics. This paper proposes a specific method of detecting and extracting graphics-connected characters. The proposed method is based on the observation that the constituent strokes of characters are usually short segments in comparison with those of grap...

Publication date: 2014